由于经过验证的2D检测技术的适用性,大多数当前点云检测器都广泛采用了鸟类视图(BEV)。但是,现有方法通过简单地沿高度尺寸折叠的体素或点特征来获得BEV特征,从而导致3D空间信息的重丢失。为了减轻信息丢失,我们提出了一个基于多级特征降低降低策略的新颖点云检测网络,称为MDRNET。在MDRNET中,空间感知的维度降低(SDR)旨在在体素至BEV特征转换过程中动态关注对象的宝贵部分。此外,提出了多级空间残差(MSR),以融合BEV特征图中的多级空间信息。关于Nuscenes的广泛实验表明,该提出的方法的表现优于最新方法。该代码将在出版时提供。
translated by 谷歌翻译
来自单眼图像的3D对象检测是计算机视觉的具有挑战性且长期存在的问题。为了从不同的角度组合信息而没有麻烦的2D实例跟踪,最近的方法倾向于通过在空间中密集的常规3D网格进行采样,这是效率低下的多视图。在本文中,我们试图通过提出可学习的关键点采样方法来改善多视图特征聚合,该方法将伪表面点散布在3D空间中,以保持数据稀疏性。然后使用多视图几何约束和视觉特征增强的分散点来推断场景中的对象位置和形状。为了明确地弥补单帧和模型多视图几何形状的局限性,我们进一步提出了一个表面滤波器模块以抑制噪声。实验结果表明,就3D检测而言,我们的方法的性能明显优于以前的作品(在某些类别的扫描仪上改善了0.1 AP)。该代码将公开可用。
translated by 谷歌翻译
本地图像功能匹配,旨在识别图像对的识别和相应的相似区域,是计算机视觉中的重要概念。大多数现有的图像匹配方法遵循一对一的分配原则,并采用共同最近的邻居来确保跨图像之间本地特征之间的独特对应关系。但是,来自不同条件的图像可能会容纳大规模变化或观点多样性,以便一对一的分配可能在密集匹配中导致模棱两可或丢失的表示形式。在本文中,我们介绍了一种新颖的无探测器本地特征匹配方法Adamatcher,该方法首先通过轻巧的特征交互模块与密集的特征相关联,并估算了配对图像的可见面积,然后执行贴片级多到 - 一个分配可以预测匹配建议,并最终根据一对一的完善模块进行完善。广泛的实验表明,Adamatcher的表现优于固体基线,并在许多下游任务上实现最先进的结果。此外,多对一分配和一对一的完善模块可以用作其他匹配方法(例如Superglue)的改进网络,以进一步提高其性能。代码将在出版时提供。
translated by 谷歌翻译
基于学习的多视图立体声(MVS)方法取得了令人印象深刻的进步,并且近年来超越了传统方法。但是,它们的准确性和完整性仍在挣扎。在本文中,我们提出了一种新方法,以增强受对比度学习和功能匹配启发的现有网络的性能。首先,我们提出了一个对比匹配损失(CML),该损失将正确的匹配点视为正样品,将正确的匹配点视为正样本,并将其他点视为阴性样本,并根据特征的相似性计算对比度损失。我们进一步提出了一个加权局灶性损失(WFL),以提高分类能力,从而削弱了根据预测的置信度,在不重要的区域中低信任像素对损失的贡献。在DTU,坦克和寺庙和混合MVS数据集上进行的广泛实验表明,我们的方法可实现最先进的性能,并在基线网络上取得了重大改进。
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects from entity-level, relation-level, and triple-level. And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on link prediction task and well performs analogical inference.
translated by 谷歌翻译
Digital engineering transformation is a crucial process for the engineering paradigm shifts in the fourth industrial revolution (4IR), and artificial intelligence (AI) is a critical enabling technology in digital engineering transformation. This article discusses the following research questions: What are the fundamental changes in the 4IR? More specifically, what are the fundamental changes in engineering? What is digital engineering? What are the main uncertainties there? What is trustworthy AI? Why is it important today? What are emerging engineering paradigm shifts in the 4IR? What is the relationship between the data-intensive paradigm and digital engineering transformation? What should we do for digitalization? From investigating the pattern of industrial revolutions, this article argues that ubiquitous machine intelligence (uMI) is the defining power brought by the 4IR. Digitalization is a condition to leverage ubiquitous machine intelligence. Digital engineering transformation towards Industry 4.0 has three essential building blocks: digitalization of engineering, leveraging ubiquitous machine intelligence, and building digital trust and security. The engineering design community at large is facing an excellent opportunity to bring the new capabilities of ubiquitous machine intelligence and trustworthy AI principles, as well as digital trust, together in various engineering systems design to ensure the trustworthiness of systems in Industry 4.0.
translated by 谷歌翻译
Surgical robot automation has attracted increasing research interest over the past decade, expecting its huge potential to benefit surgeons, nurses and patients. Recently, the learning paradigm of embodied AI has demonstrated promising ability to learn good control policies for various complex tasks, where embodied AI simulators play an essential role to facilitate relevant researchers. However, existing open-sourced simulators for surgical robot are still not sufficiently supporting human interactions through physical input devices, which further limits effective investigations on how human demonstrations would affect policy learning. In this paper, we study human-in-the-loop embodied intelligence with a new interactive simulation platform for surgical robot learning. Specifically, we establish our platform based on our previously released SurRoL simulator with several new features co-developed to allow high-quality human interaction via an input device. With these, we further propose to collect human demonstrations and imitate the action patterns to achieve more effective policy learning. We showcase the improvement of our simulation environment with the designed new features and tasks, and validate state-of-the-art reinforcement learning algorithms using the interactive environment. Promising results are obtained, with which we hope to pave the way for future research on surgical embodied intelligence. Our platform is released and will be continuously updated in the website: https://med-air.github.io/SurRoL/
translated by 谷歌翻译